ABSTRACT



A method for permitting software optimization tools, software
instrumenting tools and other analysis tools to re-write
executables having mixed instructions and data uses a data
structure having an entry for each multi-bit word in an
executable file. Each entry of the data structure includes a
number of flags that are set to identify the type of the
multi-bit word in the associated line of the executable file.
The types include instruction, data and unclassified. Each
entry also includes a flag that indicates that the multi-bit
word should not be optimized and a flag indicating that the
multi-bit word is a problem branch. The no-optimize and
problem branch flags may be used to identify multi-bit
words that may be either branch instructions or data, and to
ensure that such multi-bit words are not affected by
optimization or other rewriting of the executable. In addition, a
problem fall through flag is provided to maintain program
flow for possible fall through code segments.
MECHANISM FOR RE-WRITING AN
EXECUTABLE HAVING MIXED CODE AND 
DATA 

FIELD OF THE INVENTION 

This invention relates generally to the field of computer 
software and more particularly to software optimization and 
analysis tools. 

BACKGROUND OF THE INVENTION 

As is known in the art, computers operate using a
combination of hardware and software. The software controls
how the computer hardware functions to produce a desired
result. The software that is provided on a computer system
includes an operating system and may include one or more
software applications. The operating system controls the
basic operations of the computer hardware and presents a
framework for allowing other software applications to
interface with the hardware. Known operating systems include
the VMS operating system, MS DOS, Windows 95,
Windows NT, or UNIX, among others.

Another software application that is often included in the
software of the computer system is a compiler. The compiler
is a software routine that translates software applications,
written in a high level code such as C++, into an executable
file capable of being run on the computer hardware. The
executable file includes a set of instructions and data for
implementing the software application on the computer
hardware. The set of instructions comprises instructions
from an instruction set of the computer. For example, a
reduced instruction set computer (RISC) has an instruction
set that includes simple operations, such as load and store
operations. In general, the order of instructions in the
executable file mirrors the order of instructions in the
software application listing, with each instruction of the
software application being translated into one or more
corresponding instructions from the instruction set of the
computer.

Once an executable file of the software application has
been generated, the software application may be executed on
the computer system. Generally, the executable file is stored
in a memory of the computer system, such as main memory,
disk drive or other such device. As the software application
is executed, portions are moved from an external memory
into a local memory referred to as a cache memory. The
cache memory is a relatively fast memory that is used for
temporary storage of instructions and data that are to be
executed on a processor of the computer hardware. By
providing fast access to the instructions and data, the cache
memory helps to improve the performance of the software
application by reducing the delay incurred when retrieving
instructions and data from memory.

Each time that an instruction or data is required for
operation of the software application, if the instruction or
data is not stored in the cache it must be fetched from the
memory. Because of the delays associated with retrieving
data from memory, it is desirable to ensure that those
instructions that are to be executed frequently are stored in
the cache.

Optimization tools have been provided to improve the
performance of software applications by re-arranging the
order of instructions in the executable files to maximize
cache usage. Re-arranging the order of the instructions in the
executable files may be done to group together frequently
executed instructions such that the group may be forwarded
to the cache in one operation. Optimization tools may also
re-arrange the order of instructions for a variety of other
purposes.
One problem encountered by optimization tools is that the
executable files provided by different compilers of different 
operating systems frequently have different formats. Some 
operating system compilers, such as the Windows NT 
compiler, interleave instructions and data to ensure that data 
that is needed by instructions is also moved to the cache 
when the associated instructions are moved to the cache. By 
placing data near the instructions, fewer cache accesses need 
to be made, fewer delays are incurred and the overall 
performance of applications is increased. 

However, because the instructions and data are basically 
both multi-bit data words, when instructions and data are 
interleaved it is difficult for the optimization tool to identify 
which multi-bit data words are instructions capable of being 
re-arranged. Thus, it is often difficult to optimize executables 
generated by operating systems that interleave instructions 
and data. 

SUMMARY OF THE INVENTION 

A method for permitting software optimization tools,
instrumenting tools and other software analysis devices to
re-write executables having mixed instructions and data
uses a data structure having an entry for each multi-bit word 
in an executable file. Each entry of the data structure 
includes a number of flags that are set to identify the type of 
the multi-bit word in the associated line of the executable 
file. The types include instruction, data and unclassified. 
Each entry also includes a flag that indicates that the 
multi-bit word should not be optimized and a flag indicating 
that the multi-bit word is a branch. 
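
As a minimal sketch of the data structure described above, one entry per multi-bit word can be modeled as a small set of bit flags. The names WordFlags, classification and new_table are illustrative assumptions and do not come from the patent, which does not prescribe any particular encoding.

```python
from enum import IntFlag, auto


class WordFlags(IntFlag):
    """Hypothetical per-word flags; one entry exists for each word of the executable."""
    INSTRUCTION = auto()   # word is known to be an instruction
    DATA = auto()          # word is known to be data
    NO_OPTIMIZE = auto()   # word must not be moved or rewritten
    BRANCH = auto()        # word may be a branch (it could also be data)


def classification(flags):
    """Return the type of a word; a word with neither type flag is unclassified."""
    if flags & WordFlags.INSTRUCTION:
        return "instruction"
    if flags & WordFlags.DATA:
        return "data"
    return "unclassified"


def new_table(num_words):
    """Create a table with one empty entry per word in the executable."""
    return [WordFlags(0)] * num_words


table = new_table(4)
table[0] |= WordFlags.INSTRUCTION
table[2] |= WordFlags.DATA
table[3] |= WordFlags.BRANCH | WordFlags.NO_OPTIMIZE
```

Under these assumptions, an optimizer would consult the table and treat only words whose entry has INSTRUCTION set and NO_OPTIMIZE clear as candidates for re-ordering.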

The executable code is iteratively traversed to identify 
those multi-bit words that are instructions and those multi- 
bit words that are data. After the traversal there may remain 
unclassified multi-bit words; i.e., multi-bit words which 
have not been classified yet as either data or instruction type. 
A traversal is then made of the unclassified multi-bit words. 
When an ambiguous multi-bit word is encountered, it is 
determined whether or not the unclassified multi-bit word is 
a potential branch instruction. If the multi-bit word is a 
potential branch, then the number of instructions displacing 
the potential branch instruction from the target is maintained 
as a constant. In one embodiment, a region of multi-bit 
words from the potential branch instruction to the potential 
branch destination is frozen; i.e., those multi-bit words 
within the region will have their associated no optimization 
flag set in the data structure so that the multi-bit word will 
not be optimized. Freezing the region of instructions
preserves flow control and data integrity, as will be described in
more detail later herein. In an alternative embodiment,
instructions within the region may change in order, provided 
the total displacement from the branch to the target remains 
constant. 
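
The freezing embodiment can be illustrated with a short sketch. It assumes the flags are kept as plain integers indexed by word position, that NO_OPTIMIZE is a hypothetical bit value, and that the frozen region includes both the potential branch and its target; none of these details is fixed by the text above.

```python
NO_OPTIMIZE = 0x1  # hypothetical bit value for the no-optimization flag


def freeze_region(flags, branch_index, target_index):
    """Set NO_OPTIMIZE on every word from the potential branch to its target, inclusive.

    Works for forward and backward branches by ordering the two indices first.
    """
    lo, hi = sorted((branch_index, target_index))
    for i in range(lo, hi + 1):
        flags[i] |= NO_OPTIMIZE
    return lo, hi


flags = [0] * 10
frozen = freeze_region(flags, branch_index=6, target_index=2)  # a backward branch
```

Because no word inside the frozen span can be moved, the displacement between the potential branch and its potential target stays constant whether the word turns out to be a real branch or merely data.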

In addition, during the traversal of multi-bit words, when 
an ambiguous multi-bit word is encountered prior to a series 
of identified instructions, the ambiguous multi-bit word is 
marked as a branch instruction having a flow through path 
to the list of instructions. The ambiguous multi-bit word is 
marked by asserting the appropriate flag in the associated 
entry of the data structure. Thus, if the ambiguous multi-bit 
word is later moved by the optimization tool, a branch will 
be inserted by the optimization tool back to the series of 
identified instructions, thereby preserving the flow of
operations of the software application.
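
The marking step for such ambiguous words might look like the following sketch. The bit values and the function name mark_problem_fall_throughs are assumptions made for illustration; the sketch covers only the marking, not the later insertion of a branch by the optimization tool.

```python
INSTRUCTION, DATA, PROBLEM_FALL_THROUGH = 0x1, 0x2, 0x4  # hypothetical bit values


def mark_problem_fall_throughs(flags):
    """Flag each unclassified word that is immediately followed by an identified instruction."""
    marked = []
    for i in range(len(flags) - 1):
        unclassified = not (flags[i] & (INSTRUCTION | DATA))
        if unclassified and flags[i + 1] & INSTRUCTION:
            flags[i] |= PROBLEM_FALL_THROUGH
            marked.append(i)
    return marked


# Word 2 is unclassified and sits directly before the instruction run at words 3 and 4.
flags = [INSTRUCTION, 0, 0, INSTRUCTION, INSTRUCTION, DATA, 0]
marked = mark_problem_fall_throughs(flags)
```

If word 2 in this example were later relocated, the flag would tell the optimization tool to append a branch back to word 3 so that any fall through behavior is preserved.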

Therefore, according to one aspect of the invention, a 
method for re-writing an executable comprises the steps of
analyzing a plurality of words in the executable to classify 
each of the plurality of words as one of either instruction,
data or unclassified, storing data representing a classification
of each of the plurality of words in the executable in a data
structure, and during optimization, selecting words of the
executable for re-ordering responsive to the data in the data
structure.

According to another aspect of the invention, a computer
readable medium having a data structure stored thereon is
provided. The data structure comprises a plurality of entries
corresponding to a plurality of words in an executable file,
with each entry of the data structure further including a flag
for indicating whether a word in a corresponding location of
the executable file is an instruction and a flag for indicating
whether a word in a corresponding location of the executable
file is a data word.

BRIEF DESCRIPTION OF THE DRAWINGS

Reference will now be made to the following drawings,
wherein like numbers refer to like elements and wherein:

FIG. 1 is a block diagram illustrating a computer system
in which the present invention may be utilized;

FIG. 2 illustrates a subset of the portions of a typical
executable file;

FIG. 3 is a data flow diagram for illustrating how the
pre-processing method of the present invention interfaces
with optimization software;

FIG. 4 illustrates a data structure provided by the present
invention to identify types of multi-bit words in the executable
file of FIG. 2, where the data structure may be provided
by the pre-optimization method of FIG. 3;

FIG. 5 is a flow diagram of one embodiment of the
pre-optimization method of FIG. 3;

FIG. 6 illustrates a control flow graph that may be
generated during an instruction analysis phase of the
pre-optimization method of FIG. 5;

FIG. 7 illustrates a control flow graph that may be
generated during a data analysis phase of the
pre-optimization method of FIG. 5;

FIG. 8 is a flow diagram illustrating a method for
identifying regions of a problem branch; and

FIG. 9 is a flow diagram for illustrating a method of
identifying and handling problem fall throughs.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

Referring now to FIG. 1, a computer system 10 in which
the present invention may be employed is shown to include
a central processing unit (CPU) 12 coupled to a memory 18.
The CPU 12 may include a processor 14, such as an Alpha™
processor, Intel processor or the like. The CPU may also
include a cache 16, which is a relatively small fast memory
for storing data retrieved from the memory 18 for use by the
processor 14.

The memory 18 may be, for example, a disk device. For
purposes of illustration, software that executes on the
computer system 10 is shown stored in the memory 18.

During operation, as instructions and data of the software
are executed they are copied into the cache 16 and
subsequently forwarded to the processor 14. The software
illustrated in FIG. 1 includes an operating system 20 and one or
more executable files (shown labeled EXE) 21 and 22. The
executable files are software applications that have been
translated to operate on the computer system 10. The executable
files 21, 22 may have been translated by a compiler
routine of the operating system 20 or alternatively may have
been provided from an external source in executable form.

In one embodiment, also shown included in memory 18
are pre-optimizer routine 25 and optimizer routine 30. The
pre-optimizer routine 25 and optimizer routine 30 operate
together to re-write the executables 21 and 22 such that the
performance of the associated software application on the
computer system is improved. The pre-optimizer of the
present invention generates data structures 27 and 29 (labeled
DS in FIG. 1), each of which corresponds to respective
executables 21 and 22. Although only two executable/data
structure pairs are shown in FIG. 1, it should be understood
that a data structure may be provided for each executable
that is to be optimized. Although conceptually two data
structures are shown, in fact the same data structure may be
used, with the contents being modified for each associated
executable file being optimized.

Referring now to FIG. 2, a block diagram illustrating a
general layout of an executable file such as executable 21 is
shown. The executable file may also be referred to as an
image, and the terms will be used herein interchangeably. It
should be noted that not all of the portions of the executable
file will be described. For purposes of clarity, only those
portions that are helpful to understand the operation of the
pre-optimization method of the present invention will be
described. A thorough description of the organization of an
executable file is disclosed in "Microsoft Portable Executable
and Common Object File Format Specification", Revision
5.0, October 1997, distributed by Microsoft Corporation
and incorporated herein by reference.

A first portion of the executable file is a header portion.
The header portion includes information describing the
image. For example, the header portion may include
information identifying what type of machine the image may run
on, the number of sections in the image (where each section
includes instructions and/or data), and other characteristics
of the image. The header of the executable file also includes
standard fields such as fields identifying the state of the
image, size of the code section, size of the data section and
an entry point address. The header of the executable file also
includes pointers to various data directories. The data
directories are shown as part of the image. Example
directories include an export table, import table, exception table
and relocation table. The export table includes addresses of
functions that are exported from the image for use in other
images. The import table includes addresses of functions
imported from other images into the image.

The relocation table has an entry for each absolute address
stored in the image. Each entry in the table includes a pointer
to the location containing the address, and a type field that
describes how to adjust the location if the image is relocated;
i.e., moved to a new base address in memory 18. For
example, the assumption is that most main images in the NT
operating system are loaded starting at address 0x400000. If
a function foo is located at address 0x402000, a pointer to
function foo in memory will contain the address value
0x402000, and the relocation table will contain an entry for
the pointer. If the image is loaded at location 0x500000
instead of 0x400000, the value of the pointer to the function
foo must be updated to point to location 0x502000, and the
relocation table entry describes how to do this. Thus, the
relocation table will describe locations including pointers to
all of the instructions in the image. When instructions are
moved by the optimizer, the pointers described by the
relocation table need to be updated.

The image also includes a group of section headers 42,
with each section header associated with one of the sections



US 6,324,689 B1

of the image. A section of the image may be any number of
pages of the image, where a page may comprise, for
example, 8 K bytes. The section headers each include a
name, size of the section and virtual address of the section.
Image pages 44 include a number of pages of instructions
and/or data apportioned into sections.

In one embodiment, each of the instructions and data that
are included in the image 21 are formed from multi-bit
words. In one embodiment, 32 bits are used to decode
instructions and data. A 32 bit word is hereinafter referred to
as a longword. It should be understood that, although
references will be made to longwords, the present invention is not
limited to any particular number of bits for instruction and
data encoding.

According to one embodiment of the invention, a
pre-optimization tool analyzes the image and provides a data
structure having one entry for each of the longwords in the
image. In general, this data structure is used to mark the type
of longword as either instruction or data. Particulars of the
data structure are described in more detail in FIG. 4.
Identifying the types of the longwords either as instruction
or data allows an optimization tool to optimize executables
having mixed instruction and data format by clearly
identifying those longwords that may be re-arranged.

Referring now to FIG. 3, a data flow diagram is provided
to illustrate how the pre-optimization tool works in
conjunction with an optimization tool to re-write executable code. In
the data flow diagram of FIG. 3, generated data is
represented by trapezoids while functional units are represented
as blocks. At step 32 a pre-optimization tool is executed. As
described above, the pre-optimization tool analyzes the
executable to identify longwords as either instructions or
data. There may be some longwords in the executable file
which remain unclassified after analysis and some regions of
the executable which the pre-optimization tool has decided
should be frozen. The details of how the pre-optimization
tool performs its analysis are included below. Suffice it to
say that after the analysis is completed, a data structure such
as data structure 27 is provided.

At step 34, the optimization or instrumenter routine is
executed. An instrumenter tool places a counter at every
branch to analyze the branch behavior of the software
application. As described above, an optimization tool
re-writes the executable to improve the performance of the
software application when it is executed on the computer
system 10 by re-arranging the instructions in the executable.
For example, the number of instructions executed in a
frequently executed loop may be reduced by moving some
instructions out of the loop. In addition to an optimizer and
an instrumenter, the data structure may be used to assist
other routines that track the performance and data usage of
the image, and therefore the present invention is not limited
to the use of the pre-optimization tool with any particular
application.

Referring now to FIG. 4, one embodiment of a data
structure 29 that may be generated by the pre-optimization
tool is shown. The data structure 29 includes a number of
flags 45a-45y. Each of the flags identifies a potential
characteristic of the associated longword. A number of flags
indicate the operating status of the pre-optimization tool
with regard to that longword. For example, Visited flag 45a
indicates whether the pre-optimization tool has analyzed the
associated longword. The Jump analyzed flag 45v
indicates that the associated longword is a jump instruction
and that the destination of the jump instruction has already
been analyzed.

A number of the flags are associated with instructions and
indicate the type of instruction of the associated longword.
The flags which are associated with instructions are as
follows: An Instruction flag 45b marks the longword as an
instruction. Procedure descriptor begin flag 45e, Procedure
descriptor end flag 45f and Image entry 45r all indicate that
the associated longword is an instruction that is either the
entry point or exit point of a procedure or image. A
Bsr_target flag 45m, Br_target flag 45n, Jump_target flag 45o,
Handler target flag 45p, and possible unknown branch target
flags 45w are flags that are set if the associated longword is
an instruction that is possibly a target of a branch, jump,
exception handler, or other control transfer operation.

The Esym target flag 45q indicates that the associated
longword may possibly be the target of an external symbol;
i.e., the address of a particular longword is exported out of
the image. For example, in a shared library, all of the entry
points are external symbols, and other images linked against
the shared library may jump to them. The Esym target may
be either an instruction or data longword.

In addition, a number of flags are associated with data and
indicate a type, association or origin of the associated
longword data. For example, the Data flag 45c indicates
whether or not the associated longword is data. The Idata
array begin flag 45g and the Idata array end flag 45h are used to
mark the origin longwords and the end longwords of the
arrays that make up the import table. The Rel_highlow flag
45l is used to mark longword data that contains an absolute
address.

Other flags include the pad flag 45s, which is set to
indicate if the instruction is a pad instruction (such as a
NOP), or if the data is pad data (for alignment purposes), and the
Undefined operation flag 45t, which identifies a longword
that has a bit pattern that does not correspond to a valid
instruction. A problem fall through flag 45y is used to
identify problem fall through paths; i.e., unclassified
longwords that are succeeded by a series of one or more
instructions. Problem fall throughs will be described in more
detail later herein.

All of the above flags may be used to fine tune the
re-writing operation of the optimizer. Additional flags that
may also be helpful to the specific optimization,
instrumentation or analysis tool are within the scope of the invention.

Referring now to FIG. 5, a flow diagram illustrating one
embodiment of a method of operating the pre-optimization
tool will now be described. At step 60, system data structures
stored in the executable are analyzed to identify known
instructions and data within the image. For example,
information in the header file identifying the entry point of the
executable may be used to mark the data structure entry of
the associated longword as an instruction. The procedure
descriptor table includes entry point and exit point addresses
to instructions of procedures; therefore, the corresponding
flags in the data structure are set for the longwords
associated with the addresses in the procedure descriptor table.
The procedure descriptor table also includes the entry point
address of each exception handler, and the corresponding
flags in the data structure are set for the longwords
associated with the entry point addresses. In addition, a longword
may be classified as data if it has a bit pattern that does not
correspond to a valid instruction. For valid programs, the
targets of certain relocation types can only be instructions,
and the targets of other relocation types can only be data.
Therefore, the corresponding flags in the data structure are
set for the longwords that are the targets of these relocation
types.
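
The first stage of analysis at step 60 can be sketched as follows. This is a simplified illustration under stated assumptions: the bit values, the function name seed_known_types and its parameters are hypothetical, the lists of known addresses stand in for whatever the header, procedure descriptor table and relocation table actually provide, and is_valid_instruction is a placeholder for a real instruction decoder.

```python
INSTRUCTION, DATA = 0x1, 0x2   # hypothetical bit values
WORD_SIZE = 4                  # bytes per 32-bit longword


def seed_known_types(words, base, instruction_addrs, data_addrs, is_valid_instruction):
    """Mark longwords whose type is known before any flow analysis is performed.

    instruction_addrs: addresses known to hold instructions (entry point,
        procedure entry and exit points, handler entries, code relocation targets).
    data_addrs: addresses known to hold data (data relocation targets).
    """
    flags = [0] * len(words)
    for addr in instruction_addrs:
        flags[(addr - base) // WORD_SIZE] |= INSTRUCTION
    for addr in data_addrs:
        flags[(addr - base) // WORD_SIZE] |= DATA
    # A bit pattern that does not decode as a valid instruction can only be data.
    for i, word in enumerate(words):
        if flags[i] == 0 and not is_valid_instruction(word):
            flags[i] |= DATA
    return flags


words = [0x10, 0x11, 0xFFFFFFFF, 0x12, 0x13]
flags = seed_known_types(
    words,
    base=0x400000,
    instruction_addrs=[0x400000, 0x40000C],           # entry point, one procedure entry
    data_addrs=[0x400010],                            # one data relocation target
    is_valid_instruction=lambda w: w != 0xFFFFFFFF,   # toy decoder for the example
)
```

In this toy image the second longword stays unclassified after seeding, which is the situation the later flow analysis and data analysis passes are meant to resolve.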
Because the entries of the import table and other system
tables do not contain instructions, the data structure entries
associated with these tables may be set to mark these tables
as data. Thus, during the first stage of analysis by the
pre-optimization tool, all known longword types are marked
in the data structure.

Following the first stage analysis, many of the longwords
will remain unclassified. At step 61, the pre-optimization
tool begins a flow analysis for the executable file. A flow
graph is generated, starting at the entry point and walking
forward along all control paths. Each time a longword is
encountered in the traversal of the control flow graph, its
Visited flag 45a is set. Because an assumption is made that
the executable operates correctly, once an instruction has
been identified, unless the identified instruction is an
instruction which totally changes the flow of code, it can be assured
that the longword immediately following the instruction
must also be an instruction. Thus, until the flow of
instructions is stopped by a return, jump, branch or other control
transfer operation, it is assumed that the instructions follow
a fall through path.

For each longword which is identified as an instruction
because it is encountered in a fall through path and which
decodes as a control transfer instruction (jump or branch), the
targets of the control transfer instruction are determined.
When the target longwords are determined, the Instruction
flags 45b are set in the corresponding data structure entries.
Additionally, one of the branch target flags 45m and 45n or
jump target flag 45o is set.

A portion of an example flow diagram that could be built
by analyzing the executable 21 is shown in FIG. 6. The entry
point from the header file indicates that longword 70 is the
first instruction in the image, in this example a Load
instruction, where a value X is loaded into register R1.
Starting from longword 70, the next longword in the
sequence of the executable is retrieved. Because it is
assumed that the software application operates correctly,
longword 72 is assumed to be an instruction and the
Instruction flag in the data structure entry corresponding to
longword 72 is set. For purposes of illustration, longword 72
decodes to a Load instruction, and a value Y is loaded into
register R2. Because the Load instruction is not a control
transfer instruction, longword 74 must also be an instruction,
and the Instruction flag in the data structure entry
corresponding to longword 74 is set. In this example, longword
74 decodes to an Add. Because the Add does not change
control flow, longword 76 is identified as an instruction,
and its corresponding Instruction flag is set in the data
structure. In this example,
longword 76 is a branch instruction, BNE R3, which
conditionally changes the direction of flow depending upon the
state of the register R3 to a target destination. The target
destination is retrieved from the branch instruction 76, and
the longword 84 according to the instruction has its
Instruction flag and branch target flag set in the data structure entry.
In addition, because the branch instruction 76 has a flow
through path, longword 78 is identified as an instruction. In
this example, longword 78 is an absolute Jump instruction,
which changes the flow of control for the program. The
target of the jump is determined and longword 88 is marked
as an instruction.

Thus, no inferences may be made at this point as to
whether longwords 80 and 82 are instructions or data. At this
point, tracing of the first control path (instructions 70-78)
continues along the pathway starting at longword 88 (the
target of the Jump). The control path continues to be traced
until either a longword is encountered that has already been
analyzed (which may be determined by examining its
Visited flag) or an instruction that decodes as a return or halt
instruction is encountered. When any of these stop
conditions is met, analysis begins at a first instruction in a next
control path (such as one of the identified branch target
paths). This process continues until all of the longwords
have been visited.

Referring back to FIG. 5, once the executable has been
analyzed to mark longwords as instructions, it is again
analyzed to mark longwords as data. An attempt is made to
identify larger contiguous regions of data. For each
longword that is identified as a data element at step 60, a linear
backwards scan (decreasing addresses) is performed to
search for a longword that has already been classified as an
instruction or a longword that has an encoding of a control
transfer instruction. Each longword encountered in the
backward scan operation that has not already been classified and
is not a control transfer instruction has its corresponding
Data flag 45c in the data structure set. Because it is assumed
that the program operates normally, once a data region has
been identified, it can be assured that the instruction
sequence will not fall through and start executing the data.
Since a control transfer operation must occur before data is
encountered, a backwards search for the control transfer
instruction may therefore be used to identify regions of data.

For example, referring now to FIG. 7, assume that
longword 92 was identified at step 60 as data. First, a linear
backwards analysis of each of longwords 94, 96 and 98
indicates that none of these longwords has been
classified (as either instructions or data) and that none are control
transfer operations. Therefore, the Data flags in the entries of
the data structure corresponding to the longwords 94, 96 and
98 are set. However, once longword 99 (a control transfer
operation) is encountered, the backwards analysis of this
specific path is complete. This analysis is performed for each
longword identified as data at step 60.

After the backwards linear analysis has completed, for
each of the longwords identified as data at step 60, a forward
traversal of the executable is performed. If the successor
longword has not been classified, the successor longword is
classified as data. The linear forward scan continues until a
longword is encountered that has already been classified or
until a longword that is a possible target of a control transfer
operation is identified. The linear forward scan continues for
each data item until all the originally identified data
elements have been visited. When all of the data elements have
been visited for backwards and forwards linear scan, the data
portion of the analysis is complete.

Referring back again to FIG. 5, once the data marking of
step 62 is complete, at step 63 it is determined whether to
re-execute the instruction marking and data marking steps
61 and 62, respectively. Each pass of the instruction and data
marking steps provides information that is helpful to resolve
the classification of unclassified longwords. Therefore, it is
advantageous to iteratively execute the mark instruction and
mark data steps until the number of longwords remaining
unclassified becomes relatively constant. Thus, at step 63, a
decision will be made to repeat steps 61 and 62 until the
number of longwords classified as instructions, the number
of longwords classified as data and the number of longwords
unclassified are relatively constant on each pass.

At step 64, all of the unclassified longwords are decoded
to identify problem branches. A problem branch is a
longword that has an opcode portion that decodes to a branch
opcode, yet the longword has not been classified as an
instruction or as data. It could be that the longword in fact
is an instruction. However, it could also be that the longword
is merely data having a bit pattern corresponding to a branch
instruction.

Problem branches pose a difficulty because no assumption
can safely be made as to the classification of the longword.
If it is assumed that the longword is data when in fact the
longword is a branch, the flow of instructions from the
branch may be altered by the optimization tool, causing the
software application to execute inaccurately. For example,
assume that the longword includes a number of bits that
encode as a branch and a number of bits that encode as the
offset target for the branch. If it is assumed that the longword
is merely data, and the optimization tool later moves either the
branch or the branch target, the offset field of the branch
instruction will not be updated. As a result the software
application will execute erroneously.

If it is assumed that the longword is a branch, when in fact
the longword is data, then it will be assumed that the lower
data bits indicate an offset or target. If the offset or target is
moved, the lower data bits of the longword will be modified.
If in fact the longword is data, this modification will change
data and adversely affect the software application.

Therefore, in order to ensure that problem branches do not
get optimized, the region between the branch instruction and
the target(s) of the branch is frozen; i.e., the entire region
may be moved by the optimizer but no individual instructions
within the region that could affect the branch instruction
and target offset should be re-arranged. By freezing this
region it can be assured that the displacement between the
branch instruction and the branch target remains constant
and that the contents of ambiguous longwords are not
modified.

Referring now to FIG. 8, a method for marking a problem
branch region is shown. At step 100, the ambiguous longword
is decoded to extract a potential branch displacement
value. The displacement value may be positive, indicating a
forward branch, or negative, indicating a backward branch.
Also at step 100, the problem branch flag 45* is set in the
entry of the data structure corresponding to the ambiguous
longword.

At step 102, the displacement value is added to (or
subtracted from) the address of the ambiguous longword to
determine the address of the target of the branch. The
address of the ambiguous longword and the address of the
branch target are compared, the smaller is saved as the Start
of Region, and the larger is saved as the End of Region. All
the longwords between the Start of Region and End of
Region, including the two endpoints, are pushed onto a stack,
which we call WORK. Also at step 102, the possible
unknown branch target flag 45m and the no-optimize flag 45d
are set in the entry of the data structure corresponding to the
calculated target longword. At step 103, a longword is
popped off the top of the WORK stack. The longword is
decoded, and its no-optimize flag 45d is set. At step 104 it
is determined whether or not this longword is a problem
branch. If this longword is not a problem branch, the process
then proceeds to step 109. If this longword is a problem
branch, then at step 105 the target is calculated, and the flags
are updated.

At step 106, it is determined whether the displacement is
positive (indicating a forward branch) or negative
(indicating a backwards branch). If it is a forward branch, at
step 107a the new target is compared against the End of
Region address. If the new target has an address that is
sequentially greater than the End of Region address, at step
108a the new target address is stored as the End of Region
address, and all the longwords between the old End of
Region and the new End of Region, including the new End
of Region, are pushed onto the WORK stack. The process
then proceeds to step 109.

If at step 106 it was determined that the problem branch
is a backwards branch, then at step 107b the new target is
compared against the Start of Region address. If the new
target has an address that is sequentially less than the Start
of Region address, at step 108b the new target address is
stored as the Start of Region address, and all the longwords
between the old Start of Region and the new Start of Region,
including the new Start of Region, are pushed onto the
WORK stack. The process then proceeds to step 109. Thus,
the size of the region is extended to cover the displacements
of each problem branch encountered between the original
branch instruction and the end of the region. Depending
upon the displacements of the branches that are encountered,
the region may grow in both the positive and negative
directions.

At step 109, it is determined whether the WORK stack is
empty. If not, the process returns to step 103 where the next
instruction is decoded. Once the WORK stack is empty, the
process of defining the region affected by the problem
branch is complete.

The result of the problem branch analysis is therefore one
or more regions of longwords, each with their no-optimize
flag set. When the optimizer attempts to re-write the
executable, the identified regions are essentially frozen
together; they may be moved as a unit but instructions and/or
data may not be removed from the region. Thus, the flags of
the data structure may be used to preclude optimization in
order to preserve code flow and data integrity.

In an alternative embodiment, a method used to mark
problem branch regions includes the steps of marking all
problem branch regions (i.e., the problem branch and target)
by setting the corresponding flags in the data structure,
transitively merging all overlapping ranges to provide a set
of non-overlapping problem branch ranges, and then walking
over each problem branch range and setting the
no_optimize flag for each longword encountered within
each range.

In another alternative embodiment, rather than freezing
the entire region, information about the Start of Region
address, End of Region address, problem branch sources and
problem branch targets may be used to ensure that a constant
displacement is maintained between the problem branches
and their targets. With such an arrangement, optimization of
instructions within the regions would be permissible
provided displacements remained fixed.

The data structure may also be used to mark problem fall
throughs to ensure that appropriate steps are taken to
preserve instruction flow. A problem fall through is an
unclassified longword that precedes a block of known instructions
and that does not have the encoding of an unconditional
control transfer instruction. If the ambiguous longword is an
instruction, then, when the program is executed the program
control may fall through to execute the block of instructions.
However, because the longword is ambiguous, there must be
some way to indicate to the optimizer to maintain the
program control flow.

Referring now to FIG. 9, a flow diagram for illustrating
one embodiment of a method of handling problem fall
through instructions will now be described. At step 200,
each unclassified longword that is not the encoding of an
unconditional transfer of control and that precedes a series
of one or more instructions is identified as a problem fall
through by setting the problem_fall_through flag 45y in its 
associated entry of the data structure. At step 202, a table is 
built that contains, for each problem fall through, an instruc- 
tion object that uniquely identifies the problem fall through 
and an associated instruction object that uniquely identifies 
the following instruction (i.e., the fall through target). 

At step 204 optimization occurs, and instructions are 
moved in the executable. At step 206, following 
optimization, the table is scanned to determine whether each 
problem fall through still immediately precedes the target. If 
the problem fall through does not immediately precede the
target, then at step 208 a branch instruction is inserted
immediately following the problem fall through, with the 
branch destination pointing to the address of the fall through 
target. If it turns out that the ambiguous longword was data, 
the branch instruction will never be executed (since data is 
not executed). 
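
By way of illustration only, steps 200 through 208 might be sketched as follows. The `Word` record, its `uid` and `target_uid` fields, and both function names are hypothetical and do not appear in the description above; the inserted branch refers to its target symbolically by `uid`, on the assumption that actual addresses are resolved in a later pass.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    uid: int                       # hypothetical id that survives reordering
    kind: str                      # "instr", "data" or "unclassified"
    uncond_transfer: bool = False  # decodes as an unconditional transfer
    target_uid: Optional[int] = None  # set only on inserted branches


def build_fall_through_table(words: List[Word]) -> List[Tuple[int, int]]:
    """Steps 200-202: pair each problem fall through with its successor."""
    table = []
    for cur, nxt in zip(words, words[1:]):
        if (cur.kind == "unclassified" and not cur.uncond_transfer
                and nxt.kind == "instr"):
            table.append((cur.uid, nxt.uid))
    return table


def repair_fall_throughs(words: List[Word],
                         table: List[Tuple[int, int]]) -> List[Word]:
    """Steps 206-208: re-link any pair that the optimizer separated."""
    out = list(words)
    next_uid = max((w.uid for w in out), default=-1) + 1
    for src_uid, dst_uid in table:
        i = [w.uid for w in out].index(src_uid)
        still_adjacent = i + 1 < len(out) and out[i + 1].uid == dst_uid
        if not still_adjacent:
            # Insert a branch right after the ambiguous longword.
            out.insert(i + 1, Word(next_uid, "instr", True, dst_uid))
            next_uid += 1
    return out
```

In this sketch a pair that is still adjacent after optimization is left untouched, so the executable grows only where the optimizer actually separated a problem fall through from its target.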

Accordingly, a method has been provided for pre-optimizing
an executable file to prepare the executable file
for manipulation by an optimizer, an instrumenter or another
software tool. By providing a data structure that identifies
characteristics of the associated executable, the same
optimization tool may be run on a variety of different executable
files, regardless of their internal structure.
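
As a non-limiting illustration of how such a data structure might be populated, the region-marking procedure of FIG. 8 (steps 100 through 109) can be sketched as follows. Several simplifications are assumed here and are not taken from the description: addresses are list indices, `branch_disp[a]` holds the decoded displacement when longword `a` is unclassified and decodes as a branch (otherwise `None`), the target is taken as address plus displacement, every target lies inside the image, and the flags "updated" at step 105 are taken to be the same ones set at steps 100 and 102. The names `Flag` and `mark_problem_branch_region` are hypothetical.

```python
from enum import IntFlag
from typing import List, Optional, Tuple


class Flag(IntFlag):
    PROBLEM_BRANCH = 1
    POSSIBLE_BRANCH_TARGET = 2
    NO_OPTIMIZE = 4


def mark_problem_branch_region(start: int,
                               branch_disp: List[Optional[int]],
                               flags: List[Flag]) -> Tuple[int, int]:
    """Freeze the region reachable from the problem branch at `start`."""
    disp = branch_disp[start]
    if disp is None:
        raise ValueError("longword at start does not decode as a branch")
    flags[start] |= Flag.PROBLEM_BRANCH                      # step 100
    target = start + disp                                    # step 102
    lo, hi = min(start, target), max(start, target)
    flags[target] |= Flag.POSSIBLE_BRANCH_TARGET | Flag.NO_OPTIMIZE
    work = list(range(lo, hi + 1))                           # WORK stack
    while work:                                              # step 109
        addr = work.pop()                                    # step 103
        flags[addr] |= Flag.NO_OPTIMIZE
        d = branch_disp[addr]
        if d is None:                                        # step 104
            continue
        flags[addr] |= Flag.PROBLEM_BRANCH                   # step 105
        new_target = addr + d
        flags[new_target] |= Flag.POSSIBLE_BRANCH_TARGET | Flag.NO_OPTIMIZE
        if d > 0 and new_target > hi:                        # steps 107a, 108a
            work.extend(range(hi + 1, new_target + 1))
            hi = new_target
        elif d < 0 and new_target < lo:                      # steps 107b, 108b
            work.extend(range(new_target, lo))
            lo = new_target
    return lo, hi
```

Because only the newly exposed longwords are pushed when the region grows, each longword is examined a bounded number of times and the loop terminates once the WORK stack is empty.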

Having described various embodiments of the invention, 
it will now become apparent to one of skill in the art that 
other embodiments incorporating its concepts may be used. 
It is felt, therefore, that this invention should not be limited 
to the disclosed embodiment, but rather should be limited 
only by the spirit and scope of the appended claims. 

What is claimed is: 

1. A method for re-writing an executable comprising the
steps of: 

analyzing a plurality of words in the executable to classify 
each of the plurality of words as one of either 
instruction, data or unclassified, wherein the step of 
analyzing includes the step of classifying one of the 
plurality of words as a problem branch by comparing a 
plurality of bits of the one of the plurality of words 
against a set of branch opcodes to determine a match, 
and, responsive to a match between one of the set of 
opcodes and the plurality of bits, classifying the one of 
the plurality of words as a problem branch; 

storing data representing a classification of each of the 
plurality of words in the executable in a data structure; 
and 

during re-writing, selecting words of the executable for 
re-ordering responsive to the data in the data structure. 

2. The method according to claim 1, wherein said step of 
analyzing includes the step of classifying a subset of the 
plurality of words responsive to data in the executable. 

3. The method according to claim 1, wherein the data 
structure includes a plurality of entries corresponding to the 
plurality of words in the executable, each entry comprising 
an instruction flag for indicating whether the associated one 
of the plurality of words is an instruction, and wherein the 
step of analyzing includes the step of classifying one of the 
plurality of words as an instruction by asserting the instruc- 
tion flag in the corresponding entry of the data structure. 

4. The method according to claim 3, wherein the step of
analyzing includes the step of analyzing the plurality of 
words to identify instructions by: 

selecting a known instruction; 

responsive to the known instruction not being a control 
flow transfer, fetching a next instruction in a sequence 
of the plurality of words relative to the known instruc- 
tion; 



classifying the next instruction as an instruction by assert- 
ing the instruction flag in the corresponding entry of the 
data structure; and 
repeating the steps of fetching and classifying until the 
next instruction is a control transfer instruction. 

5. The method according to claim 4 further comprising the
step of: 

decoding the next instruction to determine whether the
instruction is a conditional branch instruction;
responsive to the next instruction being a conditional 
branch instruction, locating each of the target words of 
the branch instruction from the plurality of words; and 
classifying each of the target words as instructions by 
asserting the instruction flags in the corresponding 
entries of the data structure. 

6. The method according to claim 4 further comprising the
step of: 

decoding the next instruction to determine whether the 
instruction is a jump instruction or an unconditional 
branch instruction; 
responsive to the next instruction being a jump instruction 
or an unconditional branch instruction, locating the 
target of the jump instruction or the unconditional 
branch instruction in the plurality of words; and 
classifying the target of the jump instruction or an uncon- 
ditional branch instruction as an instruction by assert- 
ing the instruction flag in the corresponding entry of the 
data structure. 

7. The method according to claim 1, wherein the data
structure includes a plurality of entries corresponding to the 
plurality of words in the executable, each entry comprising 
a data flag for indicating whether the associated one of the 
plurality of words is a data word, and wherein the step of 
analyzing includes the step of classifying one of the plurality 
of words as a data word by asserting the data flag in the 
corresponding entry of the data structure. 

8. The method according to claim 7, wherein the step of 
analyzing further includes the step of analyzing the plurality 
of words to identify data words by: 

selecting a known data word; 

fetching a preceding word in the plurality of words
relative to the known data word;
responsive to the preceding word not being classified and not
decoding as a control transfer instruction, classifying 
the preceding word as a data word by asserting the data 
flag in the corresponding entry of the data structure; 
and 

repeating the steps of fetching and classifying until the 
preceding word is classified or decodes as a control 
transfer instruction. 

9. The method according to claim 7, wherein the step of 
analyzing further includes the step of analyzing the plurality
of words to identify data words by:
selecting a known data word; 

fetching a succeeding word in the plurality of words
relative to the known data word;
responsive to the succeeding word not being classified 
and not decoding as a possible control transfer target, 
classifying the succeeding word as a data word by 
asserting the data flag in the corresponding entry of the 
data structure; and 
repeating the steps of fetching and classifying until the
succeeding word is classified or decodes as a possible
control transfer target.
10. The method according to claim 1, wherein the data
structure includes a plurality of entries corresponding to the 
plurality of words in the executable, each entry comprising 
a problem branch flag for indicating whether the associated 
one of the plurality of words is a problem branch, and
wherein the step of classifying one of the plurality of words 
as a problem branch includes the step of asserting the 
problem branch flag in the corresponding entry of the data 
structure. 

11. The method according to claim 10 wherein the step of
analyzing further includes the steps of: 

for each one of the plurality of words classified as a 
problem branch: 

identifying the target of the problem branch; and 
during re-writing allowing longwords in the executable 
to be rearranged such that the number of longwords 
displacing the problem branch from the target is 
maintained as a constant. 

12. The method according to claim 10 wherein the entry
of the data structure further includes a no-optimize flag for
indicating that the one of the plurality of words associated
with the entry should not be optimized, wherein the step of
analyzing further includes the steps of: 
for each one of the plurality of words classified as a 
problem branch: 

identifying a region of longwords associated with the 
problem branch; and 

freezing the order of longwords in the region identified 
as associated with the problem branch by setting the 
no_optimize flag in the data structure entry associ- 
ated with each of the longwords in the region. 

13. The method of claim 12, wherein the step of identifying
the region further comprises the steps of, for each one
of the plurality of words classified as a problem branch: 

setting the problem branch as a start of region; 
calculating a target for the problem branch; 
designating the target as an end of region; and 
fetching each one of the plurality of words between the 
end of region and the problem branch. 

14. The method according to claim 13 wherein the step of 
identifying the region further includes the steps of, for each 
one of the plurality of words classified as a problem branch: 

decoding each one of the plurality of fetched words to
determine if the fetched word is a problem branch; 

responsive to the fetched word being a problem branch, 
determining whether the problem branch is a forward 
branch or a backward branch; 

responsive to the problem branch being a forward branch, 
comparing the target of the fetched problem branch 
against the end of region address and conditionally 
setting the end of region to equal the target of the 
fetched problem branch responsive to the fetched target 
branch being greater than the end of region; and 

responsive to the problem branch being a backward 
branch, comparing the target of the fetched problem 
branch against the start of region address and condi- 
tionally setting the start of region to equal the target of 
the fetched problem branch responsive to the fetched 
target branch being less than the start of region.

15. The method according to claim 12, wherein the step 
of freezing further comprises the step of: 
asserting the no_optimize flag in the entry of the data
structure corresponding to each longword in the region. 

16. The method according to claim 1, wherein the data 
structure includes a plurality of entries corresponding to the 
plurality of words in the executable, each entry comprising 
a problem fall through flag for indicating whether the 
associated one of the plurality of words is a problem fall 
through, and wherein the step of analyzing includes the step 
of identifying problem fall throughs and setting the problem 
fall through flag to mark a possible fall through instruction 
path. 

17. The method according to claim 16, wherein the step 
of identifying problem fall throughs further comprises the 
steps of: 

for each unclassified longword of the plurality of words 
that is followed by an instruction: 
setting the problem fall through flag in the entry of the
data structure corresponding to each unclassified
longword; and
storing, in a table, an object that uniquely identifies the
first unclassified longword and a second object that
uniquely identifies the instruction following the
unclassified longword.

18. The method according to claim 17, further comprising 
the step of, during re-writing: 

for each pair of objects in the table, comparing the 
updated address of the first object with the updated 
address of the second object; 

responsive to the updated address of the second object not 
being the immediate successor of the updated address 
of the first object, inserting a branch instruction after 
the unclassified longword, the branch instruction hav- 
ing a target address corresponding to the updated 
second address in the table entry associated with the 
unclassified longword. 

19. A computer readable medium having a data structure 
stored thereon, the data structure comprising a plurality of 
entries corresponding to a plurality of words in an executable
file, with each entry of the data structure further
comprising: 

a flag for indicating whether a word in a corresponding 
location of the executable file is an instruction; 

a flag for indicating whether a word in a corresponding 
location of the executable file is a problem fall through; 
and 

a flag for indicating whether a word in a corresponding 
location of the executable file is a data word. 

20. The computer readable medium of claim 19, wherein 
each entry of the data structure further comprises: 

a flag for indicating whether a word in a corresponding 
location of the executable file is a problem branch. 

21. The computer readable medium of claim 19, wherein 
each entry of the data structure further comprises: 

a flag for indicating whether a word in a corresponding 
location of the executable file should not be optimized. 

22. The computer readable medium of claim 21, wherein 
each entry of the data structure further comprises: 

a flag for indicating whether a word in a corresponding 
location of the executable file is a problem fall through. 